{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given the model setup:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A paper's true value:\n",
    "\n",
    "$$y^{(pr)} \\sim \\mathcal{N}(\\mu_p, \\sigma_p^2)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A reviewer's bias:\n",
    "\n",
    "$$z^{(pr)} \\sim \\mathcal{N}(\\nu_r, \\tau_r^2)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A reviewer's score for a given paper:\n",
    "\n",
    "$$x^{(pr)} | y^{(pr)}, z^{(pr)} \\sim \\mathcal{N}(y^{(pr)} + z^{(pr)}, \\sigma^2 )$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Independency**: The variables $y^{(pr)}$ and $z^{(pr)}$ are independent; the variables $(x, y, z)$ for different paper-reviewer pairs are also jointly independent."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# (a) E-step"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### (i)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hint from the definition of the problem:\n",
    "\n",
    "$$x^{(pr)} = y^{(pr)} + z^{(pr)} + \\epsilon^{(pr)} \\text{,  where  } \\epsilon \\sim \\mathcal{N}(0, \\sigma^2)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "so $x^{(pr)}$ follows [a normal distribution that is the sum of multiple independent noraml distributions](https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables):\n",
    "\n",
    "$$x^{(pr)} \\sim \\mathcal{N}(\\mu_p + \\nu_r, \\sigma_p^2 + \\tau_r^2 + \\sigma^2 )$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the joint distribution $p(y^{(pr)}, z^{(pr)}, x^{(pr)})$, its mean vector ($mu_{pr}$, use $m$ because $\\mu$ has already been used) and covariance matrix ($\\Sigma_{pr}$) are\n",
    "\n",
    "\\begin{align*}\n",
    "m_{pr} &= [\\mu_p, \\nu_r, \\mu_p + \\nu_r]^T\n",
    "\\\\\n",
    "\\Sigma_{pr} &= \n",
    "\\begin{bmatrix}\n",
    "\\sigma_p^2 & 0        & \\sigma_p^2 \\\\ \n",
    "0          & \\tau_r^2 & \\tau_r^2  \\\\ \n",
    "\\sigma_p^2 & \\tau_r^2 & \\sigma_p^2 + \\tau_r^2 + \\sigma^2\n",
    "\\end{bmatrix}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: $\\mathrm{cov}(y^{(pr)}, x^{(pr)})$ and $\\mathrm{cov}(z^{(pr)}, x^{(pr)})$ are derived based on the following. If normally distributed random variables $A$ and $B$ are independent, then $\\mathrm{cov}(A, A+B) = \\mathrm{cov}(A, A) = \\sigma_A$.\n",
    "\n",
    "Proof:\n",
    "\n",
    "\\begin{align*}\n",
    "\\mathrm{cov}(A, A+B) \n",
    "&= E[(A - E[A])(A+B - E[A+B])] \\\\\n",
    "&= E[(A - E[A])(A - E[A] + B - E[B])] \\\\\n",
    "&= E[(A - E[A])(A - E[A]) + (A - E[A])(B - E[B]))] \\\\\n",
    "&= E[(A - E[A])(A - E[A])] \\\\\n",
    "&= \\sigma_A^2 \\\\\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Therefore, following a trivariate normal distribution,\n",
    "\n",
    "\\begin{align*}\n",
    "p(x^{(pr)}, y^{(pr)}, z^{(pr)}; \\mu_p, \\nu_r, \\sigma_p^2, \\tau_r^2)\n",
    "&= \\frac{1}{(2\\pi)^{2/3} \\left|\\Sigma_{pr}\\right|^{1/2}} \\exp \\bigg(\\frac{1}{2}(a^{(pr)} - m_{pr})^T \\Sigma_{pr}^{-1} (a^{(pr)} - m_{pr}) \\bigg)\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "where $a^{(pr)} = [y^{(pr)}, z^{(pr)}, x^{(pr)}]^T$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### (ii)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "About conditional multivariate normal distribution from the lecture notes on factor analysis, http://cs229.stanford.edu/notes/cs229-notes9.pdf.\n",
    "\n",
    "Let $x_1$ and $x_2$ be two multivariate normal random variables,\n",
    "\n",
    "\\begin{align*}\n",
    "x_1 &\\sim \\mathcal{N}(\\mu_1, \\Sigma_{11}) \\\\\n",
    "x_2 &\\sim \\mathcal{N}(\\mu_2, \\Sigma_{22})\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, let $x$ be a new multivariate random variable after stacking $x_1$ and $x_2$, "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "x &= [x_1, x_2]^T \\sim \\mathcal{N}([\\mu_1, \\mu_2]^T, \\Sigma)\n",
    "\\\\\n",
    "\\Sigma &= \n",
    "\\begin{bmatrix}\n",
    " \\Sigma_{11} & \\Sigma_{12} \\\\ \n",
    " \\Sigma_{21} & \\Sigma_{22} \n",
    "\\end{bmatrix}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note $x_1$, $x_2$, $\\mu_1$, $\\mu_2$, and $\\Sigma_{ij}$ can not only be scalars, but also vectors/submatrices."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then for conditional random variable, $x_1|x_2 \\sim \\mathcal{N}(\\mu_{1|2}, \\Sigma_{1|2})$,\n",
    "\n",
    "\\begin{align*}\n",
    "\\mu_{1|2}    &= \\mu_1 + \\Sigma_{12}\\Sigma_{22}^{-1}(x_2 - \\mu_2) \\\\\n",
    "\\Sigma_{1|2} &= \\Sigma_{11} + \\Sigma_{12}\\Sigma_{22}^{-1}\\Sigma_{21}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we derive the expression for $Q_{pr}(y^{(pr)}, z^{(pr)}) = p(y^{(pr)}, z^{(pr)} | x^{(pr)})$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, figure out all the corresponding variables\n",
    "\n",
    "\\begin{align*}\n",
    "\\mu_1 &= [\\mu_p, \\nu_r]^T\n",
    "\\\\\n",
    "\\Sigma_{12} &= [\\sigma_p^2, \\tau^2]^T \\\\\n",
    "\\Sigma_{22}^{-1} &= \\frac{1}{\\sigma_p^2 + \\tau_r^2 + \\sigma^2} \\\\\n",
    "x_2 &= x^{(pr)} \\\\\n",
    "\\mu_2 &= \\mu_p + \\nu_r \\\\\n",
    "\\Sigma_{11} &= \\begin{bmatrix}\n",
    "\\sigma_p^2 & 0        \\\\ \n",
    "0          & \\tau_r^2 \\\\ \n",
    "\\end{bmatrix} \\\\\n",
    "\\Sigma_{21} &= [\\sigma_p^2, \\tau^2]\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\mu_{1|2} &=\n",
    "\\begin{bmatrix}\n",
    "\\mu_p \\\\\n",
    "\\nu_r \\\\\n",
    "\\end{bmatrix} + \\frac{x^{(pr)} - \\mu_p - \\nu_r}{\\sigma_p^2 + \\tau_r^2 + \\sigma^2} \n",
    "\\begin{bmatrix}\n",
    "\\sigma_p^2 \\\\\n",
    "\\tau^2 \\\\\n",
    "\\end{bmatrix}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\Sigma_{1|2} &= \\begin{bmatrix}\n",
    "\\sigma_p^2 & 0        \\\\ \n",
    "0          & \\tau_r^2 \\\\ \n",
    "\\end{bmatrix} - \\begin{bmatrix}\n",
    "\\sigma_p^2 \\\\\n",
    "\\tau^2 \\\\\n",
    "\\end{bmatrix} \\frac{1}{\\sigma_p^2 + \\tau_r^2 + \\sigma^2} [\\sigma_p^2, \\tau^2] \n",
    "\\\\\n",
    "&= \\begin{bmatrix}\n",
    "\\sigma_p^2 & 0        \\\\ \n",
    "0          & \\tau_r^2 \\\\ \n",
    "\\end{bmatrix} - \\frac{1}{\\sigma_p^2 + \\tau_r^2 + \\sigma^2} \\begin{bmatrix}\n",
    "\\sigma_p^4        & \\sigma_p^2 \\tau^2 \\\\\n",
    "\\tau^2 \\sigma_p^2 & \\tau^4            \\\\\n",
    "\\end{bmatrix} \n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "Q_{pr}(y^{(pr)}, z^{(pr)})\n",
    "&= p(y^{(pr)}, z^{(pr)} | x^{(pr)}) \\\\\n",
    "&= \\frac{1}{\\sqrt{2\\pi}\\left|\\Sigma_{1|2}\\right|} \\exp{\\bigg(-\\frac{1}{2} \\bigg(\n",
    "\\begin{bmatrix}\n",
    "y^{(pr)} \\\\\n",
    "z^{(pr)} \\\\\n",
    "\\end{bmatrix}\n",
    " - \\mu_{1|2}\\bigg)^T\\Sigma_{1|2}^{-1} \\bigg(\n",
    "\\begin{bmatrix}\n",
    "y^{(pr)} \\\\\n",
    "z^{(pr)} \\\\\n",
    "\\end{bmatrix}\n",
    " - \\mu_{1|2} \\bigg) \n",
    "\\bigg)} \\\\\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$Q_{pr}$ follows a bivariate normal distribution."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Warning*: math is tricky here. If you read all the way to here, thank you! Please let me know if my derivation is wrong."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# (b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At the E-step, we calculated\n",
    "\n",
    "\\begin{align*}\n",
    "w_{(y^{(pr)}, z^{(pr)})}= Q_{pr}(y^{(pr)}, z^{(pr)})\n",
    "\\end{align*}\n",
    "\n",
    "based on the equation from **(c)**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then the lower bound for log likelihood:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "l(\\mu_p, \\nu_r, \\sigma_p^2, \\tau_r^2)  \n",
    "&= \n",
    "\\sum_{p=1}^{P} \\sum_{r=1}^{R} \\sum_{(y,z)}Q_{pr}(y^{(pr)}, z^{(pr)})\\log \\frac{p(y^{(pr)}, z^{(pr)}, x^{(pr)})}{Q_{pr}(y^{(pr)}, z^{(pr)})}\n",
    "\\\\ \n",
    "&=\n",
    "\\sum_{p=1}^{P} \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(pr)}, z^{(pr)})} \\log \\frac{p(y^{(pr)}, z^{(pr)}, x^{(pr)})}{w_{(y^{(pr)}, z^{(pr)})}}\n",
    "\\\\\n",
    "&=\n",
    "\\sum_{p=1}^{P} \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(pr)}, z^{(pr)})} \\log \\frac{\\frac{1}{(2\\pi)^{2/3} \\left|\\Sigma_{pr}\\right|^{1/2}} \\exp \\bigg(-\\frac{1}{2}(a^{(pr)} - m_{pr})^T \\Sigma_{pr}^{-1} (a^{(pr)} - m_{pr}) \\bigg)}{w_{(y^{(pr)}, z^{(pr)})}} \\\\\n",
    "&= \\sum_{p=1}^{P} \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(pr)}, z^{(pr)})}\\bigg (\\log \\frac{1}{(2\\pi)^{2/3} \\left|\\Sigma_{pr}\\right|^{1/2}} - \\frac{1}{2}(a^{(pr)} - m_{pr})^T \\Sigma_{pr}^{-1} (a^{(pr)} - m_{pr}) - \\log w_{(y^{(pr)}, z^{(pr)})} \\bigg ) \\\\\n",
    "\\end{align*}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: $(y, z)$ are treated as a single variable to make it easy to track the summation as suggested in the problem description."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, I listed all the involved parts in the above equation,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "a^{(pr)} &= [y^{(pr)}, z^{(pr)}, x^{(pr)}]^T\n",
    "\\\\\n",
    "m_{pr} &= [\\mu_p, \\nu_r, \\mu_p + \\nu_r]^T\n",
    "\\\\\n",
    "\\Sigma_{pr} &=\n",
    "\\begin{bmatrix}\n",
    "\\sigma_p^2 & 0        & \\sigma_p^2 \\\\ \n",
    "0          & \\tau_r^2 & \\tau_r^2  \\\\ \n",
    "\\sigma_p^2 & \\tau_r^2 & \\sigma_p^2 + \\tau_r^2 + \\sigma^2\n",
    "\\end{bmatrix}\n",
    "\\\\\n",
    "\\left| \\Sigma_{pr} \\right| &= \\sigma_p^2 \\tau_r^2 \\sigma^2 \n",
    "\\\\\n",
    "C &=\n",
    "\\begin{bmatrix}\n",
    "\\tau_r^2 (\\sigma_p^2 + \\sigma^2) & \\sigma_p^2 \\tau_r^2              & -\\sigma_p^2 \\tau_r^2 \\\\ \n",
    "\\sigma_p^2 \\tau_r^2              & \\sigma_p^2 (\\tau_r^2 + \\sigma^2) & -\\sigma_p^2 \\tau_r^2 \\\\ \n",
    "- \\sigma_p^2 \\tau_r^2            & - \\sigma_p^2 \\tau_r^2            & -\\sigma_p^2 \\tau_r^2 \\\\\n",
    "\\end{bmatrix}\n",
    "\\\\\n",
    "\\Sigma_{pr}^{-1} = \\frac{1}{\\left|\\Sigma_{pr}\\right|}C &= \n",
    "\\begin{bmatrix}\n",
    "\\frac{1}{\\sigma^2} + \\frac{1}{\\sigma_p^2} & \\frac{1}{\\sigma^2}              & -\\frac{1}{\\sigma^2} \\\\ \n",
    "\\frac{1}{\\sigma^2}              & \\frac{1}{\\sigma^2} + \\frac{1}{\\tau_r^2}   & -\\frac{1}{\\sigma^2} \\\\ \n",
    "- \\frac{1}{\\sigma^2}            & - \\frac{1}{\\sigma^2}                                & -\\frac{1}{\\sigma^2} \\\\\n",
    "\\end{bmatrix}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "where $C$ is the cofactor matrix"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### To update the true value of the $i$th paper, $\\mu_i$, maximize the likehoold with respect to $\\mu_i$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Warning: Hairy math. Please let me know if I am wrong, or if there is an easier way to do so."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\frac{\\partial l}{\\partial {\\mu_i}} \n",
    "&= \\frac{\\partial}{\\partial \\mu_i} \\sum_{p=1}^{P} \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(pr)}, z^{(pr)})}\\bigg (\\log \\frac{1}{(2\\pi)^{2/3} \\left|\\Sigma_{pr}\\right|^{1/2}} - \\frac{1}{2}(a^{(pr)} - m_{pr})^T \\Sigma_{pr}^{-1} (a^{(pr)} - m_{pr}) - \\log w_{(y^{(pr)}, z^{(pr)})} \\bigg ) \\\\\n",
    "&= \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} (\\frac{\\partial m_{ir}}{\\partial \\mu_i})^T \\Sigma_{ir}^{-1}(a^{(ir)} - m_{ir})\n",
    "\\\\\n",
    "&= \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} [1, 0, 1]\\Sigma_{ir}^{-1}(a^{(ir)} - m_{ir})\n",
    "\\\\\n",
    "&= \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} [\\frac{1}{\\sigma_i^2}, 0, -\\frac{2}{\\sigma^2}](a^{(ir)} - m_{ir})\n",
    "\\\\\n",
    "&= \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} [\\frac{1}{\\sigma_i^2}, 0, -\\frac{2}{\\sigma^2}]a^{(ir)} - \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} [\\frac{1}{\\sigma_i^2}, 0, -\\frac{2}{\\sigma^2}]m_{ir}\n",
    "\\\\\n",
    "&= \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} [\\frac{1}{\\sigma_i^2}, 0, -\\frac{2}{\\sigma^2}] \n",
    "\\begin{bmatrix}\n",
    "y^{(ir)} \\\\\n",
    "z^{(ir)} \\\\\n",
    "x^{(ir)}\n",
    "\\end{bmatrix}\n",
    "- \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} [\\frac{1}{\\sigma_i^2}, 0, -\\frac{2}{\\sigma^2}]\n",
    "\\begin{bmatrix}\n",
    "\\mu_i \\\\\n",
    "\\nu_r \\\\\n",
    "\\mu_i + \\nu_r\n",
    "\\end{bmatrix}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Setting it to zero, we get\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} [\\frac{1}{\\sigma_i^2}, 0, -\\frac{2}{\\sigma^2}] \n",
    "\\begin{bmatrix}\n",
    "y^{(ir)} \\\\\n",
    "z^{(ir)} \\\\\n",
    "x^{(ir)}\n",
    "\\end{bmatrix}\n",
    "&= \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} [\\frac{1}{\\sigma_i^2}, 0, -\\frac{2}{\\sigma^2}]\n",
    "\\begin{bmatrix}\n",
    "\\mu_i \\\\\n",
    "\\nu_r \\\\\n",
    "\\mu_i + \\nu_r\n",
    "\\end{bmatrix}\n",
    "\\\\\n",
    "\\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} \\bigg(\\frac{y^{(ir)}}{\\sigma_i^2} - \\frac{2 x^{(ir)}}{\\sigma^2}\\bigg )\n",
    "&= \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} \\bigg( \\frac{\\mu_i}{\\sigma_i^2} - \\frac{2(\\mu_i + \\nu_r)}{\\sigma^2} \\bigg)\n",
    "\\\\\n",
    "\\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} \\bigg(\\frac{y^{(ir)}}{\\sigma_i^2} - \\frac{2 (x^{(ir)} - \\nu_r)}{\\sigma^2}\\bigg)\n",
    "&= \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} (\\frac{1}{\\sigma_i^2} - \\frac{2}{\\sigma^2})\\mu_i\n",
    "\\\\\n",
    "\\mu_i &= \\frac{\\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} \\bigg(\\frac{y^{(ir)}}{\\sigma_i^2} - \\frac{2 (x^{(ir)} - \\nu_r)}{\\sigma^2}\\bigg)}{\\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} (\\frac{1}{\\sigma_i^2} - \\frac{2}{\\sigma^2})}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### To update the variance of $\\mu_i$, i.e. $\\sigma_i^2$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\frac{\\partial l}{\\partial {\\sigma_i^2}} \n",
    "&= \\frac{\\partial}{\\partial \\sigma_i^2} \\sum_{p=1}^{P} \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(pr)}, z^{(pr)})}\\bigg (\\log \\frac{1}{(2\\pi)^{2/3} \\left|\\Sigma_{pr}\\right|^{1/2}} - \\frac{1}{2}(a^{(pr)} - m_{pr})^T \\Sigma_{pr}^{-1} (a^{(pr)} - m_{pr}) - \\log w_{(y^{(pr)}, z^{(pr)})} \\bigg ) \\\\\n",
    "&= \\frac{\\partial}{\\partial \\sigma_i^2} \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} \\bigg( -\\frac{1}{2} \\log\\left|\\Sigma_{ir}\\right| - \\frac{1}{2}(a^{(ir)} - m_{ir})^T \\Sigma_{ir}^{-1} (a^{(ir)} - m_{ir}) \\bigg) \n",
    "\\\\\n",
    "&= - \\frac{1}{2} \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} \\bigg( \\frac{1}{\\left|\\Sigma_{ir}\\right|}\\frac{\\partial \\left|\\Sigma_{ir}\\right|}{\\partial \\sigma_i^2}  + (a^{(ir)} - m_{ir})^T \\frac{\\partial \\Sigma_{ir}^{-1}}{\\partial \\sigma_i^2} (a^{(ir)} - m_{ir}) \\bigg) \n",
    "\\\\\n",
    "&= - \\frac{1}{2} \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} \\bigg( \\frac{1}{\\sigma_i^2}  + \\begin{bmatrix}\n",
    "y^{(ir)} - \\mu_i \\\\\n",
    "z^{(ir)} - \\nu_r \\\\\n",
    "x^{(ir)} - \\mu_i - \\nu_r \\\\\n",
    "\\end{bmatrix}^T\n",
    "\\begin{bmatrix}\n",
    "-\\frac{1}{\\sigma_i^4} & 0 & 0 \\\\\n",
    "0                     & 0 & 0 \\\\\n",
    "0                     & 0 & 0 \\\\\n",
    "\\end{bmatrix}\n",
    "\\begin{bmatrix}\n",
    "y^{(ir)} - \\mu_i \\\\\n",
    "z^{(ir)} - \\nu_r \\\\\n",
    "x^{(ir)} - \\mu_i - \\nu_r \\\\\n",
    "\\end{bmatrix} \\bigg)\n",
    "\\\\\n",
    "&= - \\frac{1}{2} \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} \n",
    "\\bigg( \\frac{1}{\\sigma_i^2} - \\frac{1}{\\sigma_i^4}(y^{(ir)} - \\mu_i)^2 \\bigg)\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Setting it to zero, we get\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\sigma_i^2 = \\frac{\\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} (y^{(ir)} - \\mu_i)^2}{\\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})}}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### To update the bias of the $j$th reviewer, $\\nu_j$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\frac{\\partial l}{\\partial {\\nu_j}} \n",
    "&= \\frac{\\partial}{\\partial \\nu_j} \\sum_{p=1}^{P} \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(pr)}, z^{(pr)})}\\bigg (\\log \\frac{1}{(2\\pi)^{2/3} \\left|\\Sigma_{pr}\\right|^{1/2}} - \\frac{1}{2}(a^{(pr)} - m_{pr})^T \\Sigma_{pr}^{-1} (a^{(pr)} - m_{pr}) - \\log w_{(y^{(pr)}, z^{(pr)})} \\bigg ) \\\\\n",
    "&= \\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} (\\frac{\\partial m_{pj}}{\\partial \\nu_j})^T \\Sigma_{pj}^{-1}(a^{(pj)} - m_{pj})\n",
    "\\\\\n",
    "&= \\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} [0, 1, 1]\\Sigma_{pj}^{-1}(a^{(pj)} - m_{pj})\n",
    "\\\\\n",
    "&= \\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} [0, \\frac{1}{\\tau_j^2}, -\\frac{2}{\\sigma^2}](a^{(pj)} - m_{pj})\n",
    "\\\\\n",
    "&= \\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} [0, \\frac{1}{\\tau_j^2}, -\\frac{2}{\\sigma^2}]a^{(pj)} - \\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} [0, \\frac{1}{\\tau_j^2}, -\\frac{2}{\\sigma^2}] m_{pj}\n",
    "\\\\\n",
    "&= \\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} [0, \\frac{1}{\\tau_j^2}, -\\frac{2}{\\sigma^2}]\n",
    "\\begin{bmatrix}\n",
    "y^{(pj)} \\\\\n",
    "z^{(pj)} \\\\\n",
    "x^{(pj)} \\\\\n",
    "\\end{bmatrix} - \\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} [0, \\frac{1}{\\tau_j^2}, -\\frac{2}{\\sigma^2}] \n",
    "\\begin{bmatrix}\n",
    "\\mu_p \\\\\n",
    "\\nu_j \\\\\n",
    "\\mu_p + \\nu_j \\\\\n",
    "\\end{bmatrix}\n",
    "\\end{align*}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Setting it to zero, we get\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} [0, \\frac{1}{\\tau_j^2}, -\\frac{2}{\\sigma^2}]\n",
    "\\begin{bmatrix}\n",
    "y^{(pj)} \\\\\n",
    "z^{(pj)} \\\\\n",
    "x^{(pj)} \\\\\n",
    "\\end{bmatrix} &=\n",
    "\\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} [0, \\frac{1}{\\tau_j^2}, -\\frac{2}{\\sigma^2}] \n",
    "\\begin{bmatrix}\n",
    "\\mu_p \\\\\n",
    "\\nu_j \\\\\n",
    "\\mu_p + \\nu_j \\\\\n",
    "\\end{bmatrix} \n",
    "\\\\\n",
    "\\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})}\\bigg(\\frac{z^{(pj)}}{\\tau_j^2} - \\frac{2x^{(pj)}}{\\sigma^2} \\bigg) &= \n",
    "\\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})}\\bigg(\\frac{\\nu_j}{\\tau_j^2} - \\frac{2(\\mu_p + \\nu_j)}{\\sigma^2} \\bigg)\n",
    "\\\\\n",
    "\\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})}\\bigg(\\frac{z^{(pj)}}{\\tau_j^2} - \\frac{2(x^{(pj)} - \\mu_p)}{\\sigma^2} \\bigg) &= \n",
    "\\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})}(\\frac{1}{\\tau_j^2} - \\frac{2}{\\sigma^2})\\nu_j\n",
    "\\\\\n",
    "\\nu_j &= \\frac{\\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})}\\bigg(\\frac{z^{(pj)}}{\\tau_j^2} - \\frac{2(x^{(pj)} - \\mu_p)}{\\sigma^2} \\bigg)}{\\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})}(\\frac{1}{\\tau_j^2} - \\frac{2}{\\sigma^2})}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### To update the variance of $\\nu_j$, i.e. $\\tau_r^2$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\frac{\\partial l}{\\partial {\\tau_j^2}} \n",
    "&= \\frac{\\partial}{\\partial \\tau_j^2} \\sum_{p=1}^{P} \\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(pr)}, z^{(pr)})}\\bigg (\\log \\frac{1}{(2\\pi)^{2/3} \\left|\\Sigma_{pr}\\right|^{1/2}} - \\frac{1}{2}(a^{(pr)} - m_{pr})^T \\Sigma_{pr}^{-1} (a^{(pr)} - m_{pr}) - \\log w_{(y^{(pr)}, z^{(pr)})} \\bigg ) \\\\\n",
    "&= \\frac{\\partial}{\\partial \\tau_j^2} \\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} \\bigg( -\\frac{1}{2} \\log\\left|\\Sigma_{pj}\\right| - \\frac{1}{2}(a^{(pj)} - m_{pj})^T \\Sigma_{pj}^{-1} (a^{(pj)} - m_{pj}) \\bigg) \n",
    "\\\\\n",
    "&= - \\frac{1}{2} \\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} \\bigg( \\frac{1}{\\left|\\Sigma_{pj}\\right|}\\frac{\\partial \\left|\\Sigma_{pj}\\right|}{\\partial \\tau_j^2}  + (a^{(pj)} - m_{pj})^T \\frac{\\partial \\Sigma_{pj}^{-1}}{\\partial \\tau_j^2} (a^{(pj)} - m_{pj}) \\bigg) \n",
    "\\\\\n",
    "&= - \\frac{1}{2} \\sum_{j=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} \\bigg( \\frac{1}{\\tau_j^2}  + \\begin{bmatrix}\n",
    "y^{(pj)} - \\mu_p \\\\\n",
    "z^{(pj)} - \\nu_j \\\\\n",
    "x^{(pj)} - \\mu_p - \\nu_j \\\\\n",
    "\\end{bmatrix}^T\n",
    "\\begin{bmatrix}\n",
    "0 & 0 & 0 \\\\\n",
    "0 & -\\frac{1}{\\tau_j^4} & 0 \\\\\n",
    "0 & 0 & 0 \\\\\n",
    "\\end{bmatrix}\n",
    "\\begin{bmatrix}\n",
    "y^{(pj)} - \\mu_p \\\\\n",
    "z^{(pj)} - \\nu_j \\\\\n",
    "x^{(pj)} - \\mu_p - \\nu_j \\\\\n",
    "\\end{bmatrix} \\bigg)\n",
    "\\\\\n",
    "&= - \\frac{1}{2} \\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} \n",
    "\\bigg( \\frac{1}{\\tau_j^2} - \\frac{1}{\\tau_j^4}(z^{(pj)} - \\nu_j)^2 \\bigg)\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Setting it to zero, we get\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\tau_j^2 = \\frac{\\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} (z^{(pj)} - \\nu_j)^2}{\\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})}}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### To summarize the updates\n",
    "\n",
    "for each paper $i$, and each reviewer $j$,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\mu_i &= \\frac{\\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} \\bigg(\\frac{y^{(ir)}}{\\sigma_i^2} - \\frac{2 (x^{(ir)} - \\nu_r)}{\\sigma^2}\\bigg)}{\\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} (\\frac{1}{\\sigma_i^2} - \\frac{2}{\\sigma^2})}\n",
    "\\\\\n",
    "\\sigma_i^2 &= \\frac{\\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} (y^{(ir)} - \\mu_i)^2}{\\sum_{r=1}^{R} \\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})}}\n",
    "\\\\\n",
    "\\nu_j &= \\frac{\\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})}\\bigg(\\frac{z^{(pj)}}{\\tau_j^2} - \\frac{2(x^{(pj)} - \\mu_p)}{\\sigma^2} \\bigg)}{\\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})}(\\frac{1}{\\tau_j^2} - \\frac{2}{\\sigma^2})}\n",
    "\\\\\n",
    "\\tau_j^2 &= \\frac{\\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} (z^{(pj)} - \\nu_j)^2}{\\sum_{p=1}^{P} \\sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})}}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The derivations of $\\mu_i$ and $\\nu_j$, $\\sigma_i^2$ and $\\tau_j^2$ are analogous, given one derivation, the other one is apparent."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Interpretation of the updates:\n",
    "\n",
    "(TODO) Is $\\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} = 1$?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Remark*: the other way for writing the lower bound of log likelihood is to split probability, but this results into the product of two multivariate normal distributions. The math may look more hairer. In contrast to Mixture of Gaussian application in http://cs229.stanford.edu/notes/cs229-notes8.pdf, the prior here ($p(y^{(pr)}, z^{(pr)})$) follows a bivariate normal distributions instead of simpler categorical distribution (generalization of Bernoulli distribution to multiple classes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "l(\\mu_p, \\nu_r, \\sigma_p^2, \\tau_r^2)  \n",
    "&= \n",
    "\\sum_{p=1}^{P} \\sum_{r=1}^{R} \\sum_{y,z}Q_{pr}(y^{(pr)}, z^{(pr)})\\log \\frac{p(y^{(pr)}, z^{(pr)}, x^{(pr)})}{Q_{pr}(y^{(pr)}, z^{(pr)})}\n",
    "\\\\ \n",
    "&=\n",
    "\\sum_{p=1}^{P} \\sum_{r=1}^{R} \\sum_{y,z} w_{(y^{(pr)}, z^{(pr)})} \\log \\frac{p(x^{(pr)}|y^{(pr)}, z^{(pr)}) p(y^{(pr)}, z^{(pr)})}{w_{(y^{(pr)}, z^{(pr)})}}\n",
    "\\\\\n",
    "&=\n",
    "\\sum_{p=1}^{P} \\sum_{r=1}^{R} \\sum_{y,z} w_{(y^{(pr)}, z^{(pr)})} \\bigg (\\log p(x^{(pr)}|y^{(pr)}, z^{(pr)}) + \\log p(y^{(pr)}, z^{(pr)}) - \\log w_{(y^{(pr)}, z^{(pr)})} \\bigg)\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
